The purpose of this notebook is to demonstrate the key functionalities of pyLogit: estimating the standard multinomial logit (MNL) model and the asymmetric choice models shown below (the asymmetric logit, uneven logit, scobit, and clog-log models).
For a description and formulation of the aforementioned asymmetric choice models, see
Brathwaite, Timothy, and Joan Walker. "Asymmetric, Closed-Form, Finite-Parameter Models of Multinomial Choice." arXiv preprint arXiv:1606.05900 (2016). http://arxiv.org/abs/1606.05900.
The dataset used in this example is the "Swissmetro" dataset from the Python Biogeme examples. The data can be downloaded at http://biogeme.epfl.ch/examples_swissmetro.html, and a detailed explanation of the variables and data-collection procedure can be found at http://www.strc.ch/conferences/2001/bierlaire1.pdf.
In [1]:
from collections import OrderedDict # For recording the model specification
import pandas as pd # For file input/output
import numpy as np # For vectorized math operations
import pylogit as pl # For MNL model estimation and
# conversion from wide to long format
In [2]:
# Load the dataset
long_swiss_metro = pd.read_csv("../data/long_swiss_metro_data.csv")
In [3]:
# Look at the first 5 rows of the data
long_swiss_metro.head().T
Out[3]:
The model specification used in this example is the following:
$$
\begin{aligned}
V_{i, \textrm{Train}} &= \textrm{ASC Train} + \\
&\quad \beta_{\textrm{tt\_transit}} \textrm{Travel Time}_{\textrm{Train}} * \frac{1}{60} + \\
&\quad \beta_{\textrm{tc\_train}} \textrm{Travel Cost}_{\textrm{Train}} * \left( GA == 0 \right) * 0.01 + \\
&\quad \beta_{\textrm{headway\_train}} \textrm{Headway}_{\textrm{Train}} * \frac{1}{60} + \\
&\quad \beta_{\textrm{survey}} \left( \textrm{Train Survey} == 1 \right) \\
\\
V_{i, \textrm{Swissmetro}} &= \textrm{ASC Swissmetro} + \\
&\quad \beta_{\textrm{tt\_transit}} \textrm{Travel Time}_{\textrm{Swissmetro}} * \frac{1}{60} + \\
&\quad \beta_{\textrm{tc\_sm}} \textrm{Travel Cost}_{\textrm{Swissmetro}} * \left( GA == 0 \right) * 0.01 + \\
&\quad \beta_{\textrm{headway\_sm}} \textrm{Headway}_{\textrm{Swissmetro}} * \frac{1}{60} + \\
&\quad \beta_{\textrm{seat}} \left( \textrm{Seat Configuration} == 1 \right) + \\
&\quad \beta_{\textrm{survey}} \left( \textrm{Train Survey} == 1 \right) + \\
&\quad \beta_{\textrm{first\_class}} \left( \textrm{First Class} == 0 \right) \\
\\
V_{i, \textrm{Car}} &= \beta_{\textrm{tt\_car}} \textrm{Travel Time}_{\textrm{Car}} * \frac{1}{60} + \\
&\quad \beta_{\textrm{tc\_car}} \textrm{Travel Cost}_{\textrm{Car}} * 0.01 + \\
&\quad \beta_{\textrm{luggage}=1} \left( \textrm{Luggage} == 1 \right) + \\
&\quad \beta_{\textrm{luggage}>1} \left( \textrm{Luggage} > 1 \right)
\end{aligned}
$$
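To make the unit conversions in the specification above concrete (travel times and headways divided by 60 to convert minutes to hours, travel costs multiplied by 0.01), the short snippet below evaluates the Train utility for one made-up observation. All of the coefficient and variable values are hypothetical placeholders chosen purely for illustration; they are not estimates or values from the Swissmetro data.
# Hypothetical coefficient values (placeholders, not estimates)
asc_train = -1.0
beta_tt_transit = -1.5     # per hour of travel time
beta_tc_train = -0.6       # per 0.01 CHF of travel cost
beta_headway_train = -0.3  # per hour of headway
beta_survey = 2.0          # effect of being surveyed on a train
# Hypothetical raw values for one observation
train_travel_time = 100.0  # minutes
train_travel_cost = 48.0   # CHF
train_headway = 30.0       # minutes
ga = 0                     # 1 if the respondent holds an annual pass (GA)
surveyed_on_train = 1      # 1 if the survey was administered on a train
# Evaluate the Train utility exactly as written in the equation above
v_train = (asc_train
           + beta_tt_transit * train_travel_time / 60.0
           + beta_tc_train * train_travel_cost * (ga == 0) * 0.01
           + beta_headway_train * train_headway / 60.0
           + beta_survey * (surveyed_on_train == 1))
print(v_train)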
Note that packages such as mlogit and statsmodels do not, by default, handle coefficients that vary over some alternatives but not all, e.g. the travel time coefficient that is specified as being the same for "Train" and "Swissmetro" but different for "Car."
In [4]:
# NOTE: - Specification and variable names must be ordered dictionaries.
#       - Keys should be variables within the long-format dataframe.
#         The sole exception to this is the "intercept" key.
#       - For the specification dictionary, the values should be lists
#         of integers or lists of lists of integers. Within a list,
#         or within the inner-most list, the integers should be the
#         alternative IDs of the alternatives whose utility specifications
#         the explanatory variable enters. Lists of lists denote
#         alternatives that will share a common coefficient for the variable
#         in question.
basic_specification = OrderedDict()
basic_names = OrderedDict()
basic_specification["intercept"] = [1, 2]
basic_names["intercept"] = ['ASC Train',
'ASC Swissmetro']
basic_specification["travel_time_hrs"] = [[1, 2,], 3]
basic_names["travel_time_hrs"] = ['Travel Time, units:hrs (Train and Swissmetro)',
'Travel Time, units:hrs (Car)']
basic_specification["travel_cost_hundreth"] = [1, 2, 3]
basic_names["travel_cost_hundreth"] = ['Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Train)',
'Travel Cost * (Annual Pass == 0), units: 0.01 CHF (Swissmetro)',
'Travel Cost, units: 0.01 CHF (Car)']
basic_specification["headway_hrs"] = [1, 2]
basic_names["headway_hrs"] = ["Headway, units:hrs, (Train)",
"Headway, units:hrs, (Swissmetro)"]
basic_specification["seat_configuration"] = [2]
basic_names["seat_configuration"] = ['Airline Seat Configuration, base=No (Swissmetro)']
basic_specification["train_survey"] = [[1, 2]]
basic_names["train_survey"] = ["Surveyed on a Train, base=No, (Train and Swissmetro)"]
basic_specification["regular_class"] = [1]
basic_names["regular_class"] = ["First Class == False, (Swissmetro)"]
basic_specification["single_luggage_piece"] = [3]
basic_names["single_luggage_piece"] = ["Number of Luggage Pieces == 1, (Car)"]
basic_specification["multiple_luggage_pieces"] = [3]
basic_names["multiple_luggage_pieces"] = ["Number of Luggage Pieces > 1, (Car)"]
In [5]:
# The column in the long-format data that identifies the alternative
# associated with each row.
custom_alt_id = "mode_id"
# The column that identifies the choice situation (observation) associated
# with each row. Note that this id ignores the fact that this is a
# panel/repeated-observations dataset.
obs_id_column = "custom_id"
# The column recording whether each row's alternative was chosen
choice_column = "CHOICE"
In [6]:
# Estimate the multinomial logit model (MNL)
swissmetro_mnl = pl.create_choice_model(data=long_swiss_metro,
alt_id_col=custom_alt_id,
obs_id_col=obs_id_column,
choice_col=choice_column,
specification=basic_specification,
model_type="MNL",
names=basic_names)
# Specify the initial values for the optimization (one zero per estimated coefficient).
swissmetro_mnl.fit_mle(np.zeros(14))
# Look at the estimation results
swissmetro_mnl.get_statsmodels_summary()
Out[6]:
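As an aside, the 14 passed to np.zeros above is simply the number of index coefficients implied by basic_specification: every entry in a value list, whether a single alternative id or an inner list of ids that share a coefficient, corresponds to one estimated coefficient. A quick sanity check (not part of the original notebook) is:
# Count the index coefficients implied by the specification dictionary
num_index_coefficients = sum(len(value) for value in basic_specification.values())
print(num_index_coefficients)  # 2 + 2 + 3 + 2 + 1 + 1 + 1 + 1 + 1 = 14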
Note that in the models estimated below, we make use of PyLogit's ability to estimate "outside" intercept values. This means that while the index, $V_{ij} = X_{ij} \beta$ (where $j$ denotes an alternative), will be subjected to some transformation, the intercepts (i.e., the Alternative Specific Constants) will not be subject to this transformation. This is done by removing the intercept from the specification dictionaries and by using the init_intercepts and init_coefs keyword arguments of the fit_mle() method of the various choice models. See below for examples.
In [7]:
# Create the various specification and name dictionaries
# for the asymmetric logit model. It is the same as for the MNL
# except that it excludes the intercept from V_{i, alternative}.
asym_specification = OrderedDict()
asym_names = OrderedDict()
for col in basic_specification:
if col != "intercept":
asym_specification[col] = basic_specification[col]
asym_names[col] = basic_names[col]
# Get the list of intercept names for the asymmetric logit model
# This is used to tell PyLogit that we will be using 'outside'
# intercept parameters.
# See equation 2 of http://arxiv.org/abs/1606.05900 for more
# details
asym_intercept_names = basic_names["intercept"]
# Specify the position, in the sorted array of alternative ids, of the
# alternative whose intercept is not estimated (here, the Car alternative)
asym_intercept_ref_pos = 2
# Give names to the shape parameters of the asymmetric logit model
# Note that not all of the shape parameters are identifiable, so we
# need to constrain one of them. This is why we do not have
# a "shape_car" name.
asym_shape_names = ["shape_train", "shape_swiss_metro"]
# Note the index of the alternative whose shape parameter is constrained
# (i.e. the Car alternative)
asym_ref = 2
swissmetro_asym = pl.create_choice_model(data=long_swiss_metro,
alt_id_col=custom_alt_id,
obs_id_col=obs_id_column,
choice_col=choice_column,
specification=asym_specification,
model_type="Asym",
names=asym_names,
shape_names=asym_shape_names,
intercept_names=asym_intercept_names,
shape_ref_pos=asym_ref,
intercept_ref_pos=asym_intercept_ref_pos)
# Also, note that we pass None as the first argument and use keyword
# arguments to specify our initial values for the optimization. This is
# necessary when using 'outside' intercept parameters.
# Dividing the MNL index coefficients by log(3) gives sensible starting
# values because, when every shape parameter equals 1/3, the asymmetric
# logit model's index coefficients correspond to the MNL estimates
# divided by log(3).
swissmetro_asym.fit_mle(None,
init_shapes=np.zeros(2),
init_intercepts=swissmetro_mnl.params.values[:2],
init_coefs=swissmetro_mnl.params.values[2:] / np.log(3),
method="bfgs")
# Look at the model results
swissmetro_asym.get_statsmodels_summary()
Out[7]:
In [8]:
# Create the various specification and name dictionaries
# for the uneven logit model. It is the same as for the MNL
# except that it excludes the intercept from V_{i, alternative}.
uneven_specification = OrderedDict()
uneven_names = OrderedDict()
for col in basic_specification:
if col != "intercept":
uneven_specification[col] = basic_specification[col]
uneven_names[col] = basic_names[col]
# Get the list of intercept names for the uneven logit model
# This is used to tell PyLogit that we will be using 'outside'
# intercept parameters.
# See equation 2 of http://arxiv.org/abs/1606.05900 for more
# details
uneven_intercept_names = basic_names["intercept"]
# Specify the position, in the sorted array of alternative ids, of the
# alternative whose intercept is not estimated (here, the Car alternative)
uneven_intercept_ref_pos = 2
# Specify the names of the uneven logit model's shape parameters
# Note that we include "shape_car" because all of the uneven logit
# model's shape parameters are identifiable.
uneven_shape_names = ["shape_train", "shape_swiss_metro", "shape_car"]
swissmetro_uneven = pl.create_choice_model(data=long_swiss_metro,
alt_id_col=custom_alt_id,
obs_id_col=obs_id_column,
choice_col=choice_column,
specification=uneven_specification,
model_type="Uneven",
names=uneven_names,
shape_names=uneven_shape_names,
intercept_names=uneven_intercept_names,
intercept_ref_pos=uneven_intercept_ref_pos)
# Also, note that we pass None as the first argument and use keyword
# arguments to specify our initial values for the optimization. This is
# necessary when using 'outside' intercept parameters with the model.
swissmetro_uneven.fit_mle(None,
init_shapes=np.zeros(3),
init_intercepts=swissmetro_mnl.params.values[:2],
init_coefs=swissmetro_mnl.params.values[2:],
method="bfgs")
# Look at the model results
swissmetro_uneven.get_statsmodels_summary()
Out[8]:
In [9]:
# Create the various specification and name dictionaries
# for the scobit model. It is the same as for the MNL
# except that it excludes the intercept from V_{i, alternative}.
scobit_specification = OrderedDict()
scobit_names = OrderedDict()
for col in basic_specification:
if col != "intercept":
scobit_specification[col] = basic_specification[col]
scobit_names[col] = basic_names[col]
# Get the list of intercept names for the scobit model
# This is used to tell PyLogit that we will be using 'outside'
# intercept parameters.
# See equation 2 of http://arxiv.org/abs/1606.05900 for more
# details
scobit_intercept_names = basic_names["intercept"]
# Create the names of the shape parameters that are needed for the scobit model
scobit_shape_names = ["shape_train", "shape_swiss_metro", "shape_car"]
# Specify which intercept/ASC is not being estimated (namely, the Car intercept)
scobit_intercept_ref = 2
# Create the scobit model object
swissmetro_scobit = pl.create_choice_model(data=long_swiss_metro,
alt_id_col=custom_alt_id,
obs_id_col=obs_id_column,
choice_col=choice_column,
specification=scobit_specification,
model_type="Scobit",
names=scobit_names,
shape_names=scobit_shape_names,
intercept_ref_pos=scobit_intercept_ref,
intercept_names=scobit_intercept_names)
# Note that we are using 'outside' intercept parameters for this model
swissmetro_scobit.fit_mle(None,
init_shapes=np.zeros(3),
init_intercepts=swissmetro_mnl.params.values[:2],
init_coefs=swissmetro_mnl.params.values[2:],
method="BFGS",
maxiter=1200)
# Look at the model results
swissmetro_scobit.get_statsmodels_summary()
Out[9]:
In [10]:
# Create the specification and name dictionaries
# for the clog-log model. It is the same as for the MNL
# except that it excludes the intercept from V_{i, alternative}.
clog_specification = OrderedDict()
clog_names = OrderedDict()
# Copy the specification dictionary from the logit
# model, without the intercept parameters so we
# can place the intercept parameters outside the
# index
for col in basic_specification:
if col != "intercept":
clog_specification[col] = basic_specification[col]
clog_names[col] = basic_names[col]
# Get the list of intercept names for the clog-log model
# This is used to tell PyLogit that we will be using 'outside'
# intercept parameters.
# See equation 2 of http://arxiv.org/abs/1606.05900 for more
# details
clog_intercept_names = basic_names["intercept"]
# Specify which intercept/ASC is not being estimated
# (i.e., the Car intercept)
clog_intercept_ref = 2
# Estimate the Clog-log model based on the MNL model
# Note "intercept_ref_pos=2" means we are not estimating
# an intercept for the alternative whose alternative id
# falls at position 2 in the sorted array of alternative ids
swissmetro_clog = pl.create_choice_model(data=long_swiss_metro,
alt_id_col=custom_alt_id,
obs_id_col=obs_id_column,
choice_col=choice_column,
specification=clog_specification,
model_type="Cloglog",
intercept_ref_pos=clog_intercept_ref,
names=clog_names,
intercept_names=clog_intercept_names)
# Estimate the clog-log model. Note that we don't pass a single array of
# initial values but instead pass keyword arguments.
swissmetro_clog.fit_mle(None,
init_intercepts=swissmetro_mnl.params.values[:2],
init_coefs=swissmetro_mnl.params.values[2:],
method='bfgs')
# Look at the model results
swissmetro_clog.get_statsmodels_summary()
Out[10]:
In [11]:
# Create a list of the estimated model objects
model_objs = [swissmetro_mnl,
swissmetro_clog,
swissmetro_asym,
swissmetro_uneven,
swissmetro_scobit]
log_likelihood_summary = pd.Series([x.log_likelihood for x in model_objs],
index=[x.model_type for x in model_objs],
name="Log-likelihood Summary")
log_likelihood_summary
Out[11]:
As shown above, for this dataset, one can improve (in terms of log-likelihood) on the estimates offered by the standard logit model by switching to an asymmetric choice model. However, one may sometimes pay a computational price in terms of ease of estimation, since these choice models have non-concave log-likelihood functions. Additionally, one may pay a price in terms of variance inflation of one's estimates because the "shape parameters" are not locally orthogonal to the index coefficients. See "Orthogonalizing Parametric Link Transformation Families in Binary Regression Analysis" by Claudia Czado and Thomas J. Santner, The Canadian Journal of Statistics, Vol. 20, No. 1 (Mar., 1992), pp. 51-61, http://www.jstor.org/stable/3315574.
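As an optional follow-up sketch (not part of the original notebook), the log-likelihood comparison can be made explicit by differencing each model's log-likelihood against the MNL baseline, reusing the model_objs list created above:
# Compare each model's log-likelihood against the MNL baseline
mnl_log_likelihood = swissmetro_mnl.log_likelihood
log_likelihood_gains = pd.Series(
    [model.log_likelihood - mnl_log_likelihood for model in model_objs],
    index=[model.model_type for model in model_objs],
    name="Log-likelihood gain over MNL")
log_likelihood_gains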